PerspectiveNet: 3D Object Detection from a Single RGB Image via Perspective Points
Siyuan Huang, Yixin Chen, Tao Yuan, Siyuan Qi, Yixin Zhu, Song-Chun Zhu

Detecting 3D objects from a single RGB image is intrinsically ambiguous, thus requiring appropriate prior knowledge and intermediate representations as constraints to reduce the uncertainty and improve the consistency between the 2D image plane and the 3D world coordinate system. To address this challenge, we propose to adopt perspective points as a new intermediate representation for 3D object detection, defined as the 2D projections of local Manhattan 3D keypoints that locate an object; these perspective points satisfy geometric constraints imposed by the perspective projection. We further devise PerspectiveNet, an end-to-end trainable model that simultaneously detects the 2D bounding box, 2D perspective points, and 3D bounding box for each object from a single RGB image. PerspectiveNet yields three unique advantages: (i) 3D object bounding boxes are estimated from perspective points, bridging the gap between 2D and 3D bounding boxes without the need for category-specific 3D shape priors. Experiments on the SUN RGB-D dataset show that the proposed method significantly outperforms existing RGB-based approaches for 3D object detection.
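As a concrete illustration of the perspective-point representation, the sketch below (not the authors' code) projects the eight corners of a 3D bounding box onto the image plane with a pinhole camera. The corner ordering, the particular keypoint set, and the intrinsics are illustrative assumptions; the paper's local Manhattan 3D keypoints may be defined differently.

```python
import numpy as np

def box_corners_3d(center, size, yaw):
    """Return the 8 corners of a yawed 3D box in camera coordinates.

    center: (3,) box centroid; size: (3,) full extents (w, h, l); yaw: rotation
    about the vertical axis. The corner layout is illustrative, not the paper's.
    """
    w, h, l = size
    # Corners of an axis-aligned box centered at the origin.
    x = np.array([1,  1, -1, -1,  1,  1, -1, -1]) * (w / 2)
    y = np.array([1, -1, -1,  1,  1, -1, -1,  1]) * (h / 2)
    z = np.array([1,  1,  1,  1, -1, -1, -1, -1]) * (l / 2)
    corners = np.stack([x, y, z], axis=0)          # (3, 8)
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[ c, 0, s],
                  [ 0, 1, 0],
                  [-s, 0, c]])                     # yaw about the y (up) axis
    return R @ corners + np.asarray(center).reshape(3, 1)

def perspective_points(corners_3d, K):
    """Project 3D keypoints to 2D 'perspective points' with intrinsics K (3x3)."""
    uvw = K @ corners_3d                           # homogeneous image coordinates
    return (uvw[:2] / uvw[2:]).T                   # (8, 2) pixel coordinates

# Toy example: a 2 m-wide box, 5 m in front of the camera.
K = np.array([[600., 0., 320.],
              [0., 600., 240.],
              [0., 0., 1.]])
pts_2d = perspective_points(box_corners_3d([0., 0., 5.], [2., 1., 1.5], 0.3), K)
print(pts_2d.round(1))
```

Reversing this mapping is what makes the representation useful: once the 2D perspective points are regressed, the 3D box that projects onto them can be recovered, which is how the 2D and 3D bounding boxes are tied together.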
Author Response:
We'd like to express our gratitude towards all the reviewers who recognized the novelties of the proposed intermediate representation for 3D object detection and the template-based prediction, as well as the significantly improved performance. We further appreciate R3 for commenting that "predicting 3D properties by their projections is the right [...]".
Do templates in the same class have different poses?
R3: What would happen if the intermediate representation is class-agnostic? [...] Hence, the intermediate representation should be class-agnostic by Marr's theory. The SUN RGB-D dataset is imbalanced (rare objects in certain categories).
R3: Is the 3D bounding box branch necessary?
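To make the "class-agnostic, template-based prediction" discussed above more tangible, here is a hypothetical sketch of one way such a head could be structured: a small dictionary of 2D keypoint templates shared by all object categories, blended by predicted weights and refined with per-point offsets. The module name, sizes, and layer choices are assumptions for illustration and are not taken from the paper.

```python
import torch
import torch.nn as nn

class TemplateKeypointHead(nn.Module):
    """Hypothetical class-agnostic, template-based keypoint head.

    One possible reading of "template-based prediction": score a small
    dictionary of 2D keypoint templates shared by all categories, then
    regress per-point offsets. Illustrative sketch, not PerspectiveNet's
    actual head.
    """
    def __init__(self, in_dim=256, num_templates=16, num_points=8):
        super().__init__()
        self.num_points = num_points
        # Learned dictionary of normalized 2D templates, shared across classes.
        self.templates = nn.Parameter(torch.randn(num_templates, num_points, 2) * 0.1)
        self.score = nn.Linear(in_dim, num_templates)    # template mixture weights
        self.offset = nn.Linear(in_dim, num_points * 2)  # per-point refinement

    def forward(self, roi_feat):                          # roi_feat: (N, in_dim)
        w = self.score(roi_feat).softmax(dim=-1)          # (N, T)
        base = torch.einsum('nt,tpc->npc', w, self.templates)  # blend templates
        delta = self.offset(roi_feat).view(-1, self.num_points, 2)
        return base + delta                               # (N, P, 2) normalized keypoints

head = TemplateKeypointHead()
pts = head(torch.randn(4, 256))  # 4 RoIs -> 4 x 8 x 2 keypoints in RoI-normalized coords
```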
Reviews: PerspectiveNet: 3D Object Detection from a Single RGB Image via Perspective Points
Originality: To the best of my knowledge, using projected 3D bounding box corners as an intermediate representation is a novel idea. Moreover, this is much more intuitive and natural compared to previous works. The related works are very well cited, making this paper more informative.
Quality: The paper is technically sound. By introducing projected perspective points, this work achieves state-of-the-art 3D detection results on a challenging dataset. However, several ambiguities arise in the experiment section, which leave some important details unclear.